Bayesian Incremental Learning for Deep Neural Networks

Authors

  • Max Kochurov
  • Timur Garipov
  • Dmitry Podoprikhin
  • Dmitry Molchanov
  • Arsenii Ashukha
  • Dmitry Vetrov
Abstract

In industrial machine learning pipelines, data often arrive in parts. Particularly in the case of deep neural networks, it may be too expensive to train the model from scratch each time, so one would rather use a previously learned model and the new data to improve performance. However, deep neural networks are prone to getting stuck in a suboptimal solution when trained on only new data as compared to the full dataset. Our work focuses on a continuous learning setup where the task is always the same and new parts of data arrive sequentially. We apply a Bayesian approach to update the posterior approximation with each new piece of data and find this method to outperform the traditional approach in our experiments.

1 BAYESIAN INCREMENTAL LEARNING

Recent work has shown promise in incremental learning; for example, a set of reinforcement learning problems has been successively solved by a single model with the help of weight consolidation (Kirkpatrick et al., 2016) or Bayesian inference (Nguyen et al., 2017). In this work we focus on a specific incremental learning setting: we consider a single fixed task where independent data portions arrive sequentially. We formulate a Bayesian method for incremental learning and use recent advances in approximate Bayesian inference (Kingma & Welling, 2013; Kingma et al., 2015; Louizos & Welling, 2017) to obtain a scalable learning algorithm. We demonstrate that on MNIST and CIFAR-10 our method outperforms a naive fine-tuning approach, and that it can be applied to a conventional (non-Bayesian) pre-trained DNN.

Consider an i.i.d. dataset D = {x_i, y_i}_{i=1}^{N}. In an incremental learning setting, this dataset is divided into T parts D = {D_1, ..., D_T}, which arrive sequentially during training. The goal is to build an efficient algorithm that takes a model trained on the first t-1 units of data D_1, ..., D_{t-1} and retrains it on a new unit of data D_t without access to D_1, ..., D_{t-1} and without forgetting the dependencies learned from them. The most naive deep learning approach to incremental learning is to apply Stochastic Gradient Descent (SGD) updates with the same loss function on the new data parts, i.e., to fine-tune the model. However, in that case the model is likely to converge to a local optimum on the new data unit without preserving the information learned from the previous parts of the data.

The Bayesian framework is a powerful tool for working with probabilistic models. It allows us to estimate the posterior distribution p(w | D_1, ..., D_t) over the weights w of the model. We can use Bayes' rule to sequentially update the posterior distribution in the incremental learning setting:

p(w | D_1, ..., D_t) ∝ p(D_t | w) p(w | D_1, ..., D_{t-1})    (1)

Unfortunately, in most cases the posterior distribution p(w | D_1, ..., D_t) is intractable, so we use stochastic variational inference (Hoffman et al., 2012) to approximate it. In the next section we present a scalable method for incremental learning and study different variational approximations of the posterior distribution.
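The sequential update in Eq. (1) can be made concrete with a small sketch: an approximate posterior over the weights is fitted to each arriving data chunk by stochastic variational inference, and the posterior learned on D_{t-1} is reused as the prior when fitting D_t. The Python (PyTorch) snippet below is a minimal illustration under several assumptions of ours (a linear classifier instead of a deep network, synthetic data, a fully factorized Gaussian posterior, and arbitrary hyperparameters); it is not the authors' implementation.

# Minimal sketch of Bayesian incremental learning (Eq. 1) with mean-field
# stochastic variational inference. Illustrative only: model, data, and
# hyperparameters are placeholders, not the paper's actual setup.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
D_IN, D_OUT, N_CHUNKS, N_PER_CHUNK, EPOCHS = 20, 2, 3, 200, 200

# Synthetic classification data split into sequentially arriving chunks D_1..D_T.
true_w = torch.randn(D_IN, D_OUT)
X = torch.randn(N_CHUNKS * N_PER_CHUNK, D_IN)
y = (X @ true_w).argmax(dim=1)
chunks = list(zip(X.chunk(N_CHUNKS), y.chunk(N_CHUNKS)))

# Factorized Gaussian q(w) = N(mu, diag(sigma^2)) over the weights of a linear model.
mu = torch.zeros(D_IN, D_OUT, requires_grad=True)
log_sigma = torch.full((D_IN, D_OUT), -3.0, requires_grad=True)

# The prior starts as N(0, 1); after each chunk it is replaced by the learned posterior.
prior_mu, prior_sigma = torch.zeros(D_IN, D_OUT), torch.ones(D_IN, D_OUT)

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    # KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ), element-wise, then summed.
    return (torch.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2) - 0.5).sum()

for t, (x_t, y_t) in enumerate(chunks, start=1):
    opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)
    for _ in range(EPOCHS):
        opt.zero_grad()
        sigma = log_sigma.exp()
        # Reparameterization trick: sample weights w ~ q(w).
        w = mu + sigma * torch.randn_like(sigma)
        nll = F.cross_entropy(x_t @ w, y_t, reduction="sum")
        # Negative ELBO on the current chunk, with the previous posterior as prior (Eq. 1).
        loss = nll + kl_diag_gaussians(mu, sigma, prior_mu, prior_sigma)
        loss.backward()
        opt.step()
    # The approximate posterior after chunk t becomes the prior for chunk t+1.
    prior_mu, prior_sigma = mu.detach().clone(), log_sigma.exp().detach().clone()
    acc = ((X @ mu.detach()).argmax(dim=1) == y).float().mean().item()
    print(f"after chunk {t}: accuracy on all data seen so far = {acc:.3f}")

In this toy setup each chunk only refines the previous approximate posterior rather than starting again from the zero-mean prior, which is the behaviour that Eq. (1) prescribes for incremental learning without access to the earlier data.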


Similar articles

Hardware-Aware Exponential Approximation for Deep Neural Networks

In this paper, we address the problem of cost-efficient inference for non-linear operations in deep neural networks (DNNs), in particular, the exponential function e^x in the softmax layer of DNNs for object detection. The goal is to minimize the hardware cost in terms of energy and area, while maintaining the application accuracy. To this end, we introduce Piecewise Linear Function (PLF) for approxi...


Local Explanation Methods for Deep Neural Networks Lack Sensitivity to Parameter Values

Explaining the output of a complicated machine learning model like a deep neural network (DNN) is a central challenge in machine learning. Several proposed local explanation methods address this issue by identifying what dimensions of a single input are most responsible for a DNN’s output. The goal of this work is to assess the sensitivity of local explanations to DNN parameter values. Somewhat...


Black-Box Attacks on Deep Neural Networks via Gradient Estimation

In this paper, we propose novel Gradient Estimation black-box attacks to generate adversarial examples with query access to the target model’s class probabilities, which do not rely on transferability. We also propose strategies to decouple the number of queries required to generate each adversarial example from the dimensionality of the input. An iterative variant of our attack achieves close ...


ICLR 2018 Attention-Based Guided Structured Sparsity of Deep Neural Networks

Network pruning aims to impose sparsity in a neural network architecture by increasing the portion of zero-valued weights, in order to reduce the model size for energy efficiency and to increase evaluation speed. In most of the conducted research efforts, the sparsity is enforced for network pruning without any attention to the internal network characteristics such as unbalanced out...


Empirical Risk Landscape Analysis for Understanding Deep Neural Networks

This work aims to provide comprehensive landscape analysis of empirical risk in deep neural networks (DNNs), including the convergence behavior of its gradient, its stationary points and the empirical risk itself to their corresponding population counterparts, which reveals how various network parameters determine the convergence performance. In particular, for an l-layer linear neural network ...


ICLR 2018 Deep Sensing: Active Sensing Using Multi-Directional Recurrent Neural Networks

For every prediction we might wish to make, we must decide what to observe (what source of information) and when to observe it. Because making observations is costly, this decision must trade off the value of information against the cost of observation. Making observations (sensing) should be an active choice. To solve the problem of active sensing we develop a novel deep learning architecture:...



Journal:

Volume:   Issue:

Pages:  -

Publication year: 2018